Synopsis: Naive Trees

Let's have an introduction to the antipattern for Naive Trees by learning its objective and uses.

Let’s suppose you work as a software developer for a famous website for science and technology news.

It’s a modern website, so readers can write comments and reply to each other, forming threads of discussion that both branch out and extend deeply. You want to track these reply chains and choose a simple solution for it—that each comment references the comment to which it replies.

Creating Comments table

It soon becomes clear, however, that it’s hard to retrieve a long chain of replies in a single SQL query. You can only get the immediate children or, at best, the grandchildren, up to a fixed depth. But the threads can have an unlimited depth. You would need to run many SQL queries to get all the comments in a given thread.

You try another idea: retrieving all the comments and assemble them into tree data structures in application memory, using the traditional tree algorithms that you learned in school. But the publishers of the website have told you that they publish dozens of articles every day, and each article can have hundreds of comments. Sorting through millions of comments every time someone views the website is impractical.

There must be a better way to store the threads of comments so you can retrieve a whole discussion thread simply and efficiently.

Objective: Store and query hierarchies#

It’s common for data to have recursive relationships. Data may be organized in a treelike or hierarchical way. In a tree data structure, each entry is called a node. A node may have a number of children and one parent. The top node, which has no parent, is called the root. The nodes at the bottom, which have no children, are called leaves. The nodes in the middle are simply non-leaf nodes.

An illustration of the tree data structure is shown below.

Tree data structure illustration

In the previous hierarchical data, you may need to query individual items, related subsets of the collection, or the whole collection. Examples of tree-oriented data structures include the following:

  • Organization chart: The relationship of employees to managers is the textbook example of tree-structured data. It appears in countless books and articles on SQL. In an organizational chart, each employee has a manager representing the employee’s parent in a tree structure. The manager is also an employee.

  • Threaded discussion: As seen in the introduction, a tree structure may be used for the chain of comments in reply to other comments. In the tree, the children of a comment node are its replies.

In this chapter, we’ll use the threaded discussion example to show the antipattern and its solutions.

Legitimate uses of the antipattern#

As will be seen shortly, the problem discussed in this chapter often utilizes Adjacency Lists. The Adjacency List design may be the appropriate method to achieve our application’s goals. The strength of the Adjacency List design is that it can retrieve the direct parent or child of a given node. It’s also easy to insert rows in Adjency Lists. If these operations are all we need to do with our hierarchical data, then Adjacency List can work well for us.

Do not over-engineer

Some brands of RDBMS support extensions to SQL to support hierarchies stored in the Adjacency List format. The SQL-99 standard defines recursive query syntax using the WITH keyword followed by a common table expression.

Recursive query syntax

Microsoft SQL Server 2005, Oracle 11g, IBM DB2, and PostgreSQL 8.4 support recursive queries using common table expressions, as shown earlier.

MySQL, SQLite, and Informix are examples of database brands that don’t support this syntax yet. It’s the same for Oracle 10g, which is still widely used. In the future, we can expect that recursive query syntax will become available across all popular brands, and then using Adjacency List won’t be so limiting.

Oracle 9i and 10g support the WITH clause but not for recursive queries. Instead, there are two types of proprietary syntax: START WITH and CONNECT BY PRIOR. We can use this syntax to perform recursive queries:

Recursive query syntax that Oracle 9i and 10g support
Solution: Create an Intersection Table
Antipattern: Always Depend on One’s Parent
Mark as Completed
Report an Issue